In the new era of personalization, learning the heterogeneous treatment effect (HTE) has become an inevitable trend with numerous applications. Yet most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency in the common panel data setting. The treatment evaluators developed for panel data, on the other hand, typically ignore individualized information. To fill this gap, we initiate the study of HTE estimation in panel data. Under different assumptions for HTE identifiability, we propose the corresponding heterogeneous one-side and two-side synthetic learners, namely H1SL and H2SL, by leveraging state-of-the-art HTE estimators for non-panel data and generalizing the synthetic control method to allow a flexible data-generating process. We establish the convergence rates of the proposed estimators. Extensive numerical studies demonstrate the superior performance of the proposed methods over existing ones.
Latent factor model estimation typically relies either on using domain knowledge to manually pick several observed covariates as factor proxies, or on purely multivariate analysis such as principal component analysis. However, the former approach may suffer from bias, while the latter cannot incorporate additional information. We propose to bridge these two approaches while allowing the number of factor proxies to diverge, and hence make latent factor model estimation robust, flexible, and statistically more accurate. As a bonus, the number of factors is also allowed to grow. At the heart of our method is a penalized reduced rank regression that combines information. To further deal with heavy-tailed data, a computationally attractive penalized robust reduced rank regression method is proposed. We establish faster rates of convergence compared with the benchmark. Extensive simulations and real examples are used to illustrate the advantages.
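LiDAR and cameras are two complementary sensors for 3D perception in autonomous driving. LiDAR point clouds carry accurate spatial and geometric information, while RGB images provide texture and color data for contextual reasoning. To exploit LiDAR and cameras jointly, existing fusion methods tend to rely on calibration to align each 3D point with a single projected image pixel, i.e., a one-to-one mapping. However, the performance of these methods depends heavily on calibration quality, which is sensitive to the temporal and spatial synchronization of the sensors. We therefore propose a Dynamic Cross Attention (DCA) module with a novel one-to-many cross-modality mapping that learns multiple offsets from the initial projection toward its neighborhood and thus develops tolerance to calibration error. In addition, a \textit{dynamic query enhancement} is proposed to perceive the model-independent calibration, which further strengthens DCA's tolerance to initial misalignment. The whole fusion architecture, named Dynamic Cross Attention Network (DCAN), exploits multi-level image features and adapts to multiple representations of point clouds, which allows DCA to serve as a plug-in fusion module. Extensive experiments on nuScenes and KITTI demonstrate the effectiveness of DCA. The proposed DCAN outperforms state-of-the-art methods on the nuScenes detection challenge.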
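High-quality data plays a central role in ensuring the accuracy of policy evaluation. This paper initiates the study of efficient and safe data collection for bandit policy evaluation. We formulate the problem and investigate several of its representative variants. For each variant, we analyze its statistical properties, derive the corresponding exploration policy, and design an efficient algorithm for computing it. Both theoretical analysis and experiments support the usefulness of the proposed methods.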
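Two-sided markets such as ride-sharing companies often involve a group of subjects who make sequential decisions across time and/or location. With the rapid development of smartphones and the Internet of Things, they have substantially transformed the transportation landscape. In this paper, we consider large-scale fleet management in ride-sharing companies, which involves multiple units in different areas receiving sequences of products (or treatments) over time. Major technical challenges, such as policy evaluation, arise in these studies because (i) spatial and temporal proximity induces interference between locations and times; and (ii) the large number of locations results in the curse of dimensionality. To address both challenges simultaneously, we introduce a multi-agent reinforcement learning (MARL) framework for carrying out policy evaluation in these studies. We propose novel estimators for the mean outcomes under different products that remain consistent despite the high dimensionality of the state-action space. The proposed estimators perform favorably in simulation experiments. We further illustrate our method using a real dataset obtained from a two-sided marketplace company to evaluate the effects of applying different subsidy policies. A Python implementation of our proposed method is available at https://github.com/runzhestat/causalmarl.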
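Realizing general-purpose language intelligence has been a long-standing goal of natural language processing, in which standard evaluation benchmarks play a fundamental and guiding role. We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic. To this end, we propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) a hierarchical benchmark framework, in which datasets are selected and organized in a principled way according to a language capability-task-dataset hierarchy; (2) a multi-level scoring strategy, in which model performance is reported at different levels based on the hierarchical framework. To facilitate the use of CUGE, we provide a public leaderboard that can be customized to support flexible model judging criteria. Evaluation results on representative pre-trained language models indicate ample room for improvement toward general-purpose language intelligence. CUGE is publicly available at cuge.baai.ac.cn.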
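Recently, many approaches based on single or multiple representations have been proposed to improve the performance of point cloud semantic segmentation. However, these works do not strike a good balance among performance, efficiency, and memory consumption. To address these issues, we propose DRINet++, which extends DRINet by enhancing the properties of the point cloud with a voxel-as-point principle. To improve efficiency and performance, DRINet++ mainly consists of two modules: a Sparse Feature Encoder and a Sparse Geometry Feature Enhancement. The Sparse Feature Encoder extracts local context information for each point, and the Sparse Geometry Feature Enhancement strengthens the geometric properties of the sparse point cloud through multi-scale sparse projection and attentive multi-scale fusion. In addition, we propose deep sparse supervision in the training phase to aid convergence and alleviate the memory consumption problem. DRINet++ achieves state-of-the-art outdoor point cloud segmentation on the SemanticKITTI and nuScenes datasets while running faster and consuming less memory.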
We aim to bridge the gap between common-sense, few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from the von Neumann-Landauer principle. Modelling human learning is difficult because how people learn varies from one person to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models of such learning including the Free Energy Principle and Bayesian Program Learning, approximate our theory under the Church-Turing thesis. We find that deep generative models such as the variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models, including deep neural networks, for image recognition, low-resource language processing, and character recognition.
In this paper we explore the task of modeling (semi-)structured object sequences; in particular, we focus on the problem of developing a structure-aware input representation for such sequences. We assume that each structured object is represented by a set of key-value pairs that encode its attributes. Given a universe of keys, a sequence of structured objects can then be viewed as an evolution of the values for each key over time. We encode and construct a sequential representation using the values for a particular key (Temporal Value Modeling - TVM) and then self-attend over the set of key-conditioned value sequences to create a representation of the structured object sequence (Key Aggregation - KA). We pre-train and fine-tune the two components independently and present an innovative training schedule that interleaves the training of both modules with shared attention heads. We find that this iterative two-part training results in better performance than a unified network with hierarchical encoding, as well as other methods that use a {\em record-view} representation of the sequence \cite{de2021transformers4rec} or a simple {\em flattened} representation of the sequence. We conduct experiments using real-world data to demonstrate the advantage of interleaving TVM-KA on multiple tasks, along with detailed ablation studies motivating our modeling choices. We find that our approach performs better than flattening sequence objects and also allows us to operate on significantly larger sequences than existing methods.
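The key-wise view described in this abstract, where a sequence of key-value objects is read as one value sequence per key, can be illustrated with a small sketch. This is not the authors' implementation: the function name `to_key_sequences`, the `missing` placeholder, and the toy `events` data are all hypothetical, and the sketch covers only the data reshaping that would precede any per-key encoding.

```python
def to_key_sequences(objects, missing=None):
    """Pivot a sequence of key-value objects into one value sequence per key.

    Hypothetical helper for illustration only. Keys absent from an object
    are filled with `missing` so that every per-key sequence has the same
    length as the object sequence.
    """
    # The "universe of keys" is the union of keys over all objects.
    keys = sorted({key for obj in objects for key in obj})
    # For each key, collect its value at every time step, in order.
    return {key: [obj.get(key, missing) for obj in objects] for key in keys}


# Toy sequence of three structured objects (assumed data).
events = [
    {"item": "shoes", "price": 40},
    {"item": "socks", "price": 5, "coupon": "A1"},
    {"item": "hat"},
]

key_sequences = to_key_sequences(events)
for key, values in key_sequences.items():
    print(key, values)
```

Each resulting per-key list is the kind of sequence a temporal model could encode separately before the per-key representations are aggregated.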
Face Anti-spoofing (FAS) is essential for securing face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) and gives little consideration to long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In these scenes, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) an Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network; (2) generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation; (3) a Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.